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Abstract 

Background: The high demanding computational requirements necessary to carry out protein motion simulations 
make it difficult to obtain information related to protein motion. On the one hand, molecular dynamics simulation 
requires huge computational resources to achieve satisfactory motion simulations. On the other hand, less accurate 
procedures such as interpolation methods, do not generate realistic morphs from the kinematic point of view. 
Analyzing a protein's movement is very similar to serial robots; thus, it is possible to treat the protein chain as a serial 
mechanism composed of rotational degrees of freedom. Recently, based on this hypothesis, new methodologies 
have arisen, based on mechanism and robot kinematics, to simulate protein motion. Probabilistic roadmap method, 
which discretizes the protein configurational space against a scoring function, or the kinetostatic compliance method 
that minimizes the torques that appear in bonds, aim to simulate protein motion with a reduced computational cost. 

Results: In this paper a new viewpoint for protein motion simulation, based on mechanism kinematics is presented. 
The paper describes a set of methodologies, combining different techniques such as structure normalization 
normalization processes, simulation algorithms and secondary structure detection procedures. The combination of all 
these procedures allows to obtain kinematic morphs of proteins achieving a very good computational cost-error rate, 
while maintaining the biological meaning of the obtained structures and the kinematic viability of the obtained 
motion. 

Conclusions: The procedure presented in this paper, implements different modules to perform the simulation of the 
conformational change suffered by a protein when exerting its function. The combination of a main simulation 
procedure assisted by a secondary structure process, and a side chain orientation strategy, allows to obtain a fast and 
reliable simulations of protein motion. 
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Background 

To address functional requirements or interact with other 
biological molecules, proteins undergo structural changes 
of variable degree, varying between distinct overall con- 
formations, of which only some are usually determined 
experimentally (Le the activated and inactivated forms 
of an enzyme). This is caused by the difficulties in the 
obtention of X-ray quality diffracting crystals, and conse- 
quently, it limits the knowledge on the dynamic behavior 
of the biomolecular machinery in important biological 



^Correspondence: mikel.diez@ehu.es 

1 Faculty of Engineering in Bilbao, University of the Basque Country UPV/EHU, 
Department of Mechanical Engineering, Alameda de Urquijo s/n, 4801 3 
Bilbao, Spain 

Full list of author information is available at the end of the article 



processes. Accordingly, the comprehension of the inter- 
mediate steps is crucial to overcome these difficulties, and 
provide a useful tool to fill the gaps that escape to bench- 
dependent experimental approaches so far. However, 
protein motion simulation has always been a troublesome 
problem, mostly because of its high demanding com- 
putational requirements. Precise simulations based on 
molecular dynamics are usually limited to small molecules 
or to the use of supercomputers or distributed networks 
[1-3]. However, other procedures such as Ab initio or 
Rosetta methods do not provide information related to 
protein kinematics. This information is essential if we 
want to understand the mechanisms that proteins use to 
exert their motion and hence, their functions [4] . 
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Recently, and thanks to the information available related 
to protein science, new approaches have been proposed 
in the literature [5,6] to simulate protein motion. These 
approaches are not based on either quantum mecha- 
nisms, or biology related roots, but deal with mechanism 
and robot kinematics principles [5]. One of the main 
advantages of these new methodologies is their small 
computational cost. One of the first applied methods 
is Probabilistic Roadmap Method (PRM) [6-11]. This 
method consists in discretizing the protein configura- 
tional space. Then, each position is evaluated against a 
scoring function (force field, empirical, etc) and it is con- 
sidered either correct or incorrect. Once every position 
has been checked it is possible to trace a path connecting 
neighboring positions to obtain the protein motion. The 
PRM is used on a wide variety of protein motion studies. 
In [12,13] it is proposed to use this approach to the sim- 
ulation of ligand-protein interaction. In those works, it is 
proposed to consider the degrees of freedom of the ligand, 
as well as some degrees of freedom of the protein (mostly 
side chains related degrees of freedom). In [6], a similar 
approach, considering only the ligand degrees of freedom, 
is proposed. In [14], it is studied how restrictions to the 
possible motions of the protein backbone affect the search 
algorithms used for PRM. Although this approach yields 
quite good results, the need of computing all possible con- 
figurations of the protein structure makes this procedure 
computationally costly, especially for big proteins. 

One important approach used for protein simula- 
tion, is related to the Normal Mode Analysis (NMA) 
implemented in mechanisms and robotics. This anal- 
ysis provides information related to vibrational modes 
of mechanisms, useful for the dynamic analysis of their 
structure [15]. This approach, computationally much less 
expensive than PRM, may be applied to protein struc- 
tures [16,17]. Using this methodology, information related 
to possible movements of a protein structure around its 
current configuration is obtained. Thus, although com- 
putationally less expensive, the information provided is 
not complete. New procedures combine both NMA and 
PRM to obtain large conformational changes in proteins 
[18]. In this approach NMA results are used to guide 
the PRM algorithms and reduce their computational cost. 
Kinetostatic compliance method [19,20] makes use of sev- 
eral kinematic theories to simulate protein motion. Firstly, 
it takes advantage of zero notation [21] to simplify pro- 
tein structure definition during the procedure. Secondly, 
It also implements ball and rods modelization, consider- 
ing both bond lengths and angles as constant. To execute 
the simulation process the protein potential energy field is 
transformed into equivalent forces and torques applied to 
the protein chain. Basing on the applied torques, it calcu- 
lates the dihedral angle increments for the next step of the 
simulation. 



In this paper, we present a new methodology based on 
our previous works [22,23] for protein motion simulation. 
The objective of the procedure is to morph a protein from 
one known configuration to another known one, provid- 
ing reliable and quick information in relation to protein 
kinematics and movement with a very low computational 
cost, (low enough to be used on a normal PC). The simula- 
tion procedure presented in this paper is composed of four 
independent strategies. The first one consists in a normal- 
ization procedure aimed to homogenize equivalent bond 
distances and angles in the protein structure [22]. The 
second one is a main simulation procedure entrusted to 
advance in the simulation obtaining valid structures [23] . 
A third strategy is a procedure intended to reduce pro- 
teins potential energy by changing side chains orientation. 
And the final one detects secondary structures among 
the protein chain. The novelty of the proposed proce- 
dure consists, on the one hand, in the implementation 
of the side chain orientation strategy and the secondary 
structure detection method, and on the other hand, in 
the simultaneous combination of the aforementioned four 
strategies. Consequently, the approach provides a com- 
putationally efficient simulation tool for protein motion 
simulation. To validate the results, three indicators are 
measured through the simulation process: (i) backbone 
atom root mean square deviation to compare obtained 
structures global similarity, (ii) Ramachandran plots, to 
ensure proteins biological nature and (Hi) proteins poten- 
tial energy to verify that no steric clashes have occurred 
during the simulation [24]. 

Methods 

Preparing protein structure for kinematics modeling 

Ball and rods models provide model structures valid to 
apply mechanism theorems for protein simulation [21]. 
Most protein structure models whose target is protein 
simulation incorporate some simplifications. Ca meshes 
are used in [25-27] to produce a reduced model with 
an acceptable computational cost. Rigid bond and angle 
approaches are used in the same way in [28,29]. Nor- 
mal mode analysis (NMA), usually mixes rigid bonds with 
springs to produce the structure needed for modal anal- 
ysis [17,30]. Side chains are also simplified in various 
ways to reduce the computational cost associated to them. 
In [8] it is proposed to treat side chains as spheres fill- 
ing an equivalent space. In [28], the authors propose to 
adjust the size of side chains to reduce its influence on the 
simulation, and later on, resize them progressively. 

In this paper we propose an all atom model, based on 
ball and rods approach, in which some simplifications 
are considered to reduce the overall computational cost. 
In particular, Protein degrees of freedom are reduced to 
rotations around the dihedral angles. Every other possible 
atom movement resulting from bond stretching or non- 
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proper torsions is despised. Besides the relative position 
among peptide plane atoms is maintained constant during 
the simulation process. The peptide bond angle go value is 
also limited to 0° or 180°. Regarding side chains, accord- 
ing to the proposed all atom model, every atom should be 
taken into account. In relation to side chains' degrees of 
freedom, to reduce the computational cost of the process, 
only the rotation around the Ca — Cfi bond is considered. 

As previously stated, the procedure simulates protein 
motion between two known structures. Most structural 
data for protein simulation come from experimentation. 
Thus, these data need to be compared with the pro- 
posed model to verify that all the hypotheses are fulfilled. 
One of the major drawbacks of trying to use kinematic 
theories to protein simulation is the difference between 
proteins and mechanisms structures. In mechanism kine- 
matics the linkages do not change their structural form 
or characteristics during the movement, unless the objec- 
tive of the mechanism lies precisely in that requisite 
[31]. However we must bear in mind that proteins are 
composed by atoms that atoms that are bonded by elec- 
tromagnetic and covalent forces, thus, in the case of 
proteins, there is no need to maintain constant neither 
bond length nor bond angles during the conformational 
change. 

Therefore a normalization procedure is applied to 
experimental data with the purpose of homogenizing 
equivalent bond distances and angles in the protein struc- 
ture. The approach is based on two different normal- 
ization processes, peptide plane normalization and bond 
length normalization. 

Peptide planes are normalized to get exactly 0° or 180° 
peptide bond angles. This complies with the proposed 
hypothesis of rigid peptide planes among the simulation 
[23]. The objective of normalizing the peptide planes is to 
assign to a>i the angles 0° or 180°, as it is proposed in the 
kinematic model. To do this, Cat and Ccu/+i atoms' coor- 
dinates are fixed. Then, using the least square method, 
the mean plane is calculated using the local coordinates of 
Cat, Cu Oi,Ni+i and Coti+i atoms. Fixing Cat and Cat+i 
atoms' coordinates allows to maintain protein backbone 
continuity during the normalization process, as the nor- 
malization of one peptide plane does not alter the one that 
has been already normalized. 

Local reference systems to determine relative locations 
of amino acids atoms have also been used in [32,33]. Here, 
Cat is selected as the origin of the local coordinate sys- 
tem. The straight line connecting Cat and Ca^i defines 
\\i axis direction, w; axis direction is defined by the per- 
pendicular direction to the plane containing Cat and Q 
atoms and u; axis. Finally, v/ completes the right-handed 
reference system. The advantage of using this local refer- 
ence system, is that the middle plane will always contain u; 
axis, so the least square problem is reduced to the calculus 



of the slope of this plane. This is made by the following 
formula: 

S = (wQ-a • v Ci ) 2 + (w 0i -a • v 0i ) 2 + (w Ni+l -a • v Ni+l ) 2 

(1) 

where vy, Wj are the local coordinates of Q, O/, atoms 
and a is the slope of the plane to be calculated. To find 
the value of a, Eq. (1) is differentiated with respect to a 
obtaining the result in Eq. (2): 



Wq • vq + wot • VOt + w Ni+1 • v Ni+ 



vl. + v 2 . + v 2 



(2) 



N i+ i 



Once the slope of the plane has been defined, atoms Q, 
Oi and Ni+i are projected onto it (Figure 1). 

In a second stage, distance constraints are applied to 
ensure that every bond length is identical between two 
same type of atoms. To apply these distance constraints, 
we need to define standard bond lengths. For this matter, 
reference bond lengths from the AMBER force field are 
used [34]. Distance constraints are sequentially applied 
from the fist atom of the protein chain to the last one. To 
apply the distance constraint, the formulation proposed 
in [35,36] is used. Thus, using ; as a reference atom, the 
distance constraint is applied to the bond between ; and i 
atoms according to the following expressions: 



: dji 



(r?-r/)x(r, 1 -r/)=0 



(3) 
(4) 



where rj and r? represent the position vectors of the 
atoms j and i respectively before the application of the cor- 
responding distance constraint, whereas r] is the vector 
determining the new position of atom L dji defines the the- 
oretical mean value of the bond length. Once the distance 
constraint has been applied to i atom, every subsequent 
atom of the protein chain is translated by (r* — r?). 

As Eq. (3) defines a sphere of radius dji it is necessary 
another constraint to univocally determine the position 
of i atom. This new condition determines that the new 
position fo i atom must lie on the straight line defined 
by the actual positions of i and / atoms. In Figure 2, the 




Figure 1 Normalization of the peptide planes. 



Diez etal. BMCBioinformatics20'\4 l 15:184 
http://www.biomedcentral.eom/1 471-21 05/1 5/1 84 



Page 4 of 14 




application of the distance constraint to Q (j atom) and 
JV/+i (/ atom) is represented. 

To test the viability of the normalization process, 
Type-C PPase protein from S. gordonii is used, which 
is supposed to suffer a conformational change from a 
"closed" (PDB lk20) to an "open" (PDB lk23) structure 
upon binding of pyrophosphate (PPi). SgPPase exists as 
an homodimer in solution, in which each subunit is com- 
posed of two distinct domains. The N-terminal domain 
consists of a six-stranded parallel b-sheet flanked by 
extended loops on one side and a helical subunit on 
the other one. In the absence of ligands, the C-terminal 
domains occlude the corresponding active sites. Bind- 
ing of PPi or substrate analoges to the active site of the 
N-terminal domain presumably triggers a conformational 
change that liberates the inhibition imposed by the C- 
terminal region [37]. Three strategies have been tried, 
applying distance, planes and sequential (planes and next 
distance) normalization processes. The results for the nor- 
malization of the initial and final reference conformations 
of the movement are shown in the Table 1 where they are 
compared against the protein before the normalization. 

Table 1 shows that the normalization processes have 
not altered the global structure of the proteins, the higher 
root mean square deviation being only 0.37 A. Regarding 
potential energy variation, length normalization process 
reduces its value in both lk20 and lk23 proteins. How- 
ever, the proteins' potential energy value is increased by 
3% on lk20 protein and by 1% on lk23 protein on the 
planes normalization process. This effect is the result of 

Table 1 Results of the normalization processes 

rmsdi {A) 

Normalization 

1k20 1k23 



projecting the atoms onto the calculated middle plane, 
producing the bonds distances change. Finally, sequential 
normalization process reduces this effect by first applying 
peptide plane normalization and next length normaliza- 
tion. As stated before, every distance constraint displaces 
the subsequent atoms of the protein chain, thus not alter- 
ing already normalized peptide planes. In every normal- 
ization process, Ramachandran plot values indicate that 
the normalized protein structures have always maintained 
their biological meaning. After analysing these results, 
from these three normalization processes the sequential 
one has been chosen. This normalization process fits the 
experimental data with the proposed model of the protein 
structure. Additionally, the existence of normalized pep- 
tide planes allows to exclude the peptide bond angle go as 
a degree of freedom. 

Simulation procedure for dynamic dihedral angle 
increments adjustment 

The main target of the methodology described in this 
paper is to simulate the protein motion between two 
known positions. To that aim, we have simulated the con- 
formational changes suffered by three different proteins, 
(i) Type-C Inorganic Pyrophosphatases; (ii) Troponin C; 
and (iii) Calmodulin, ranging from the displacement of 
an entire domain to the reorientation of few secondary 
structure elements. The proposed model of the protein 
structure permits considering only the dihedral angles as 
proteins degrees of freedom to produce the motion. This 
configuration resembles the protein with a very long serial 



RP (atoms in 

Energy (Kcal/mol) preferred regions) 

1k20 1k23 1k20 1k23 



Lengths 0.36 0.18 -12.6% -13.7% 94% 92% 



Planes 


0.17 


0.08 


3% 


1% 


92% 


91% 


Sequential (P+L) 


0.37 


0.21 


-10% 


-8.6% 


93% 


91% 
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robot. Thus, the motion of the protein can be defined as 
a sequence of incremental steps applied on these dihedral 
angle values, from the initial to the final conformations. 
Data structures of the proteins under study are taken 
from the Protein Data Bank (PDB), which are used as 
input data for the procedure. The simulations were car- 
ried out using a software developed by our research group 
called GIMPRO [38]. Several options are implemented 
in the software as it is stated subsequently. The soft- 
ware is able to read and save protein data with .pdb 
file format. It also produces text files with the evolu- 
tion of both studied proteins potential energy and rmsd 
evolution. Finally the software creates a .pdb file con- 
taining the obtained simulation. This file is compatible 
with other visualization programs like PYMOL in order 
to produce higher quality renderizations of the protein 
movement. 

The process uses dihedral angles increment data, A0/ 
and Ai/si, to complete the simulation. This data is obtained 
by calculating dihedral angle values of the final conforma- 
tion (df t and and subtracting dihedral angle values 

of the initial conformation (0? and xjrf) as stated on 
Eq. (5). The total increment in each dihedral angle is 
divided by p, the number of intermediate conformations 
to be computed. For each dihedral angle, the direction 
that requires the minimum angular increment to reach 
the final position is defined as its starting rotation 
direction. 

A0/ = — ; Ax/fi = l - (5) 

p p 



To assess the quality of the obtained structures, root 
mean square deviation (rmsd), Ramachandran plots [39] 
and potential energy will be evaluated during the simula- 
tion process. Regarding the root mean square deviation, 
backbone atoms are considered for its calculus. To eval- 
uate proteins' potential energy, AMBER potential force 
field, with the parameters proposed by Cornell [34] has 
been chosen. The use of the three indicators ensures the 
global similarity between the structures (rmsd), the non- 
existence of steric clashes (potential energy) and the bio- 
logical sense of the obtained structures (Ramachandran 
plots). Intermediate data structures for rmsd comparison 
are obtained from the Morph server [40] . 

In Figure 3 the flux diagram of the simulation procedure 
is shown. The first step is to calculate proteins initial con- 
formation potential energy E°. To obtain the next proteins 
conformation, the procedure starts rotating the dihedral 
angles of the protein in sequential order, from the first 
amino acid to the last one. One important characteris- 
tic of the process is that it stores the increment produced 
by each rotation on the proteins potential energy as AE ] -, 
where k is the actual step number. Once the process 
has finished the actual steps proteins' potential energy is 
obtained (E k ). 

To avoid steric clashes, the process checks if E k has 
exceeded an admissible threshold. This energy threshold 
is defined by and the proteins potential energy value 
E°. In every iteration, E k value must be below E° + E° • e^. 
To calculate 6/ c the following formula is used: 



Rotate i m DoF 
A\|/i,A(pj 



Calculate AE* 



i=i+l 



has 

' every DoF N 
vbeen rotated?/ 



Calculate new E 



yes 



Figure 3 Algorithm for the /c-th step of the procedure. 



Undo i th DoF 
rotation related to 
the highest AE k t 



Rotate i th DoF with 
a reduced increment 
value Ai^/h, Aq^/h 



Calculate E k 



, k -E°/E°>e k 


yes 










1 no 






END 










Undo i th DoF 
rotation A\j/ -J2,kq> J2 



if i th DoF has been 
blocked m times 
A\|/j = -A\j/i Acpi = -Acpj 
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Figure 4 Side chain Ca,— C/?,- bond rotational degree of freedom. 



where e determines the maximum change percentage in 
the proteins potential energy. As defined before, p is the 
total number of steps of the simulation. If E k has exceeded 
the imposed threshold for the current step, the proce- 
dure detects the dihedral angle that has generated the 
higher energy increment. Once detected, the applied rota- 
tion is rolled back. The procedure then applies a new 
rotation with a reduced increment value (A1//7/2, A0//2). 
Again, proteins potential energy is calculated and checked 



against the admissible threshold. If the threshold is 
exceded, the new reduced rotation is undone and the dihe- 
dral angle is blocked for the current step. This process is 
repeated for each dihedral angle of the protein chain. 

During the simulation, the number of times each degree 
of freedom is blocked is saved. When this number reaches 
an m value (user defined), the procedure considers that 
the dihedral angle cannot rotate in its assigned direc- 
tion, changing the rotation direction of this dihedral angle 
(A1//7 = — Ax/si, A(pi = —A(j>i) for the next n steps (user 
defined). Each dihedral angle is independently studied by 
the procedure, each one with its own record of m and n 
parameters. 

As the procedure has the capacity to change both dihe- 
dral angle increment values and rotation direction, once 
per step, if a degree of freedom has been blocked, p is 
incremented by 1. This is done with the purpose of bet- 
ter adjusting the energy tolerance of each step which is 
calculated as a function of p (see Eq. (6)). 

Procedure for side chain orientation 

In the strategy on charge of orienting the side chains, the 
main objective is to achieve a low computational cost. 
Therefore, in this first stage, only the rotational degree 
of freedom of Cat — Cfii bonds will be considered (see 
Figure 4). 

Another target is to reduce the potential energy 
value of the protein. This objective may lead to an 
energy minimization algorithm that could produce an 
unacceptable increment on the computational cost. To 
avoid this problem, as it will be explained subsequently the 
proposed procedure is based on the guidelines shown in 
Figure 5. 

The procedure starts calculating the optimal rotation 
direction for each side chain. To do so, each side chain 
is rotated in both possible directions calculating the 




Figure 5 Side chain orientation diagram. 
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Clasify amino acid as 
candidate or stable 




produced energy increments E\ and E" in the process. The 
selected rotation direction is the one that has produced 
the higher potential energy decrement, or if both rotation 
directions have increased the potential energy value, the 
direction that has produced the lesser energy increment is 
selected. Accordingly the rotation direction for each side 
chain is independently defined. 

Once the rotation directions are defined, the proce- 
dure proceeds to rotate 2° each side chain in its optimal 
direction. The value has been defined to ensure that no 
displacement on the edges of long side chains produces 
steric clashes. After this rotation, it is checked that if the 
obtained proteins potential energy E k is lower than the 
main simulation process produced structures one (E k ). 
On the contrary, the process undoes the rotations and new 
rotation of 1°. Again, obtained proteins potential energy 
E k is compared with E k . If again, E k is higher than E k , 
the process undoes the applied rotations and rotates only 
those side chains that have produced a decrement on the 
proteins potential energy. After the last rotation the values 



of ^ and£* are checked again, and in the case that E K 
is higher than E k last applied rotations are undone, leav- 
ing the structure in the initial stage before starting the side 
chains orientation procedure. 

It could be considered unnecessary to validate the last 
check between E k and E k since only those side chains that 
have produced a decrement on proteins potential energy 
have been rotated. However, the latter verification must be 
done because each side chain rotation direction has been 
independently calculated, without considering the influ- 
ence of neighboring side chains rotations. A decrement 
on the potential energy due to the rotation of a single side 
chain does not ensure that when every other side chain is 
rotated that decrement is maintained. 

Procedure for secondary structure detection by dihedral 
angle parameters evaluation 

The procedure presented in this paper to detect secondary 
structures uses the dihedral angle values, which are 
obtained in any simulation program. A previous step to 



Table 2 Results of the procedure for the detection of secondary structures in the selected proteins 





Molecular 


% of detected residues 




%of residues in 


Protein 


mass (Da) 


in secondary structures 


secondary structures (PDB) 






a-helix /?-sheet 


a-helix 


i8-sheet 


1zac 


9.98 


64.4 3.3 


70 


6 


1k9p 


10.27 


54.4 6.6 


60 


6 


3cln 


16.88 


54.5 6.2 


59 


5 



1k20 



67.75 



35.2 



24.5 



40 



20 



2peq 30.56 80.5 0 72 0 



4fkx 


55.13 


34.8 


19.7 


42 


16 


3sza 


1 04.76 


34 


22 


44 


16 
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carry out this strategy is to perform a classification of 
each amino acid attending to their dihedral angle values 
as follows: 

• Candidate: an amino acid is considered as a candidate 
when its dihedral angles are inside a zone of the 
Ramachandran plot belonging to a secondary 
structure. 

• Stable: an amino acid is considered stable when the 
procedure has checked that it belongs to a secondary 
structure. 



Obtain increments 
Acpi, A\|/i 



Calculate E° 



I 



Secondary structure 
detection module 



i 



kth step 
simulation 
procedure 



k = k+l 



Secondary chain 
reorientation 
module 



no 




END 



Figure 7 Simulation process diagram including side chain 
orientation module and secondary structure detection module. 



• Unstable: an amino acid is considered unstable when 
it cannot be classified as candidate or stable. 

To define an amino acid as a candidate, a tolerance has 
been incorporated with respect to the theoretical values 
of the dihedral angles. This tolerance has been adjusted to 
obtain the best possible results, obtaining a final value of 
30° for each type of secondary structure. 

Once the process has classified each amino acid as can- 
didate, stable or unstable, it is possible to start searching 
for secondary structures. In an initial step, the first candi- 
date amino acid of the protein chain is found and selected. 
After that, it is checked if the two subsequent amino acids 
are candidates of the same type of secondary structure. 
In this case, the three amino acids are classified as stable 
amino acids. The process continues checking if the next 
amino acids belongs to the same type of secondary struc- 
tures, until an unstable amino acid or an amino acid of 
another type of secondary structure is found. This process 
is repeated until every amino acid of the protein has been 




Figure 8 Initial (a) and final (b) positions of type C inorganic 
Pyrophosphatase (family II) from Streptococcus gordonii 
protein. The movement is given by the aperture of the protein, 
similar to a crab clamp. Represented with Pymol. 
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checked. In Figure 6 a diagram of the secondary structure 
detection process is shown. 

The use of a high tolerance resulted in unsatisfactory 
results, specifically on coil parts of the protein structure. 
As stated, one of the conditions for an amino acid to 
be set as a stable amino acid is to be part of a chain of 
three consecutive amino acids of the same type of sec- 
ondary structure.This condition reduces the detection of 
secondary structures on coils of the protein structure, 
allowing to maintain the tolerance value. 

To test the results seven proteins have been analyzed. 
The proteins under study have allowed us to validate 




Figure 9 Initial (a) and final (b) positions of 1 zac protein. The 

movement is given by displacement of the two helixes of the right 
part of the protein. Represented with Pymol. 



the procedure with different protein sizes and different 
secondary structure distributions. The obtained results 
have been compared with each proteins' structural data, 
available on the Protein Data Bank (PDB). The results are 
shown on Table 2. 





Figure 1 0 Initial (a) and final (b) positions of 3cln protein. The 

movement is given by the formation of a central a-helix. Represented 
with Pymol. 
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Table 3 1k20 Protein results 



Type of 
simulation 


rmsd {A) 


Energy (%) 


RP (% of atoms inside 
preferred zones) 


Step 
duration 


Simulation 
duration 


Type 1 


7.4 


5.4 


93 


43 min 


44 h 


Type 2 


5.97 


4.4 


93 


43 min 


37 h 


Type 3 


6.27 


5.2 


93 


31 min 


21 h 



It can be conclude that the strategy requires a very low 
computational cost, needing only 8ms to complete the 
secondary structure detection on the biggest protein, 3sza. 
Thus, its implementation on the simulation procedure 
does not increase the overall computational cost. 

Once both procedures have been explained, it is essen- 
tial to assess how to implement them on the simula- 
tion process. The aim of developing both processes, side 
chain orientation and secondary structure detection pro- 
cedures, is to keep them independent from the main 
simulation process. This way, if any of them fails to 
achieve its objective, i.e., detecting no secondary structure 
or not reducing the protein potential energy value, the 
main simulation process can continue. For this reason, the 
proposed sequence of the strategies is shown in Figure 7. 

As it can be seen in Figure 7. the secondary struc- 
ture detection procedure is executed prior to the main 
simulation process. This allows to extract the degrees 
of freedom already located on secondary structures, 



therefore, reducing the number of degrees of freedom 
from the first step. On the other hand, side chains ori- 
entation procedure is carried out after the main simu- 
lation process. In the simulation process it is required 
to obtain a valid structure prior to reducing its energy 
making use of the side chains orientation procedure. If 
the latter procedure successes on reducing the proteins 
potential energy value, then the main simulation pro- 
cess has a higher amount of available energy for the next 
step. 

Results 

Each of the simulation strategies will be applied to three 
different proteins. The first protein, type C inorganic 
Pyrophosphatase (family II) from Streptococcus gordonii 
protein, pdb entry lk20 (see Figure 8). With the pro- 
posed modelization, the protein has 4732 atoms with 604 
degrees of freedom. The next protein is the Troponin C 
protein, pdb entry lzac (see Figure 9). Again, applying the 




Figure 1 1 Snapsots of the simulation of 1 k20 protein motion simulation, (a) Initial position, (d) final position and (b) and (c) intermediate 
positions. 
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Table 4 Izac Protein results 

Type of „ RP (% of atoms inside Step Simulation 

simulation rmsd {A) Energy (%) preferred zones) duration duration 

Typel 3.15 3.8 98 121s 64 min 

Type 2 3.04 5.9 97 119 s 79 min 

Type 3 2.52 4.4 96 97 s 87 min 



proposed modelization the protein has 1347 atoms and 
176 degrees of freedom. Finally, the last protein is the 
Calmodulin protein, pdb entry 3cln (see Figure 10) with 
2201 atoms and 284 degrees of freedom. To test the afore- 
mentioned strategies the following type of combinations 
are proposed: 

• Type 1: General simulation procedure and the side 
chain orientation strategy will be applied. 

• Type 2: The three simulation strategies are applied. 

• Type 3: The same as Type 2 but with one exception: 
the degrees of freedom of side chains located on 
secondary structures will be blocked. 

To select the parameters, the results obtained in previ- 
ous simulations made with the main simulation process 
[22,23] have been considered. In particular, the simulation 
parameters have been set as follows: p = 100, s = 10%, 
m = 3, n = 1. Simulations are run in a PC under Win- 
dows XP, with a pentium core 2 duo 2.13 GHz making use 
of a single core. 

In Table 3 the results for lk20 protein are shown. As it 
can be observed we obtain a minimal rmsd value of 5.97 A 
with the Type 2 simulation strategy. The Ramachandran 
plot values indicate that the biological meaning of the pro- 
tein has been ensured in all simulation strategies. Energy 
values also indicate the non-existence of steric clashes on 
the protein structure. Regarding the computational cost of 
the process, it needs 43 minutes on both Type 1 and Type 

2 simulation strategies and 31 minutes on the Type 3 sim- 
ulation strategy to obtain each intermediate structure of 
the movement. It can be concluded that the computation- 
ally most expensive procedure is the side chain orientation 
one. This fact is corroborated with the results of Type 

3 simulation. In this simulation strategy, the degrees of 
freedom of side chains located on secondary structures 
are extracted from the main simulation. In Table 3 it 
can be seen that the computational time is reduced by 



38% for each intermediate structure. In Figure 11 several 
snapshots of the simulation are shown. 

Results for lzac protein are shown in Table 4. In this 
protein minimum rmsd error is obtained on Type 3 
simulation strategy with 2.52 A Again Ramachandran 
plot values confirm the biological meaning of interme- 
diate structures. Energy values also indicate that steric 
clashes have been avoided during the simulation proce- 
dure. Lastly, regarding the computational cost of the sim- 
ulation for this protein, the procedure has needed about 
2 minutes to obtain each intermediate structure of the 
motion, completing the simulations in about 80 minutes. 

Finally, the results for 3cln protein are shown in Table 5. 
The simulation procedure obtains a rmsd error of 6.34 A 
on Type 1 simulation strategy. Again the Ramachandran 
plot value ensures the biological meaning of the obtained 
intermediate structures and energy values indicates that 
no steric clashes have been produced. As it can be seen, 
neither Type 2 nor Type 3 simulation strategies are able 
to achieve a valid simulation. In this particular protein 
morph, secondary structures degrees of freedom (which 
have been extracted from the main simulation) are needed 
to form the central a -helix. This fact causes the failure of 
the aforementioned simulation strategies. 

The superpositions between obtained final structures 
(in red) and data structures (in green) are presented in 
Figure 12. 

Discussion 

The molecular mechanisms underlying the activity of 
many proteins involve conformational transitions by 
hinge-bending, which involves the movement of rela- 
tively rigid parts of preserved geometry about flexible 
joints. The detection of both, the rigid domains and the 
ginger regions, has been largely studied during the last 
decades [41,42]. The graphical representation of these 
dynamic events is not only essential to understand how 



Table 5 3cln Protein results 

Type of „ RP (% of atoms inside Step Simulation 

simulation rmsd W) Energy (%) preferred zones) duration duration 

Typel 6.34 2.7 90 301s 300 min 

Type 2 - - - 298 s 

Type 3 - - - 225 s 
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Figure 12 Superpositions of the final structures obtained on the 
simulation process (red) with the data structures (green) for 
1k20 protein (a), Izac protein (b) and 3cln protein (c). 



the processes take place, but it helps to unravel mechanis- 
tic aspects that are difficult to visualize by less intuitive 
indirect approaches. 

One of the main limitations on the simulation of these 
conformational transitions is the huge computational cost 



of actual simulation strategies. Actual approaches struggle 
to obtain morphs that fulfill both biologic and kinematic 
requisites. The procedure presented in this paper satisfies 
both requirements with a good cost-error relation in the 
studied proteins. 

All the procedures presented in the paper make use of 
the same protein model. In this model bonds are con- 
sidered as rigid links and the degrees of freedom have 
been limited to rotations around dihedral angles and 
each side chains first bond rotation. These hypothesis 
produces a computationally very efficient model of the 
protein structure with the sufficient mobility to simulate 
conformational changes. 

Current work is focused on the simulation of confor- 
mational changes on dimers which posses similar hinge- 
bending mechanisms. The low computational cost of the 
presented procedures may be enough to deal with these 
types of molecules. 

Conclusions 

A recurring question in the analysis of molecular mecha- 
nisms underlying the regulation of a protein deals with the 
possible structural routes that let it evolve between two 
different conformations. The identification of these path- 
ways sets the structural basis for the rational design of 
molecules that act as modulators (activators or inhibitors) 
of the proteins activity, becoming an essential tool for the 
development of new drugs in the pharmaceutical industry. 

Few alternatives arise at the time of simulating 
protein motions. Low computational cost methodolo- 
gies offer fast results but usually with kinematically 
senseless trajectories with impossible backbone move- 
ments between consecutive positions of the simulation. 
These methods are based on restrained interpolations 
of atoms coordinates [40], and rely on intermediate 
energy minimization processes to solve the steric clashes 
produced during the simulation. On the other hand, sim- 
ulation procedures based on molecular dynamics need 
huge computational resources to complete successful sim- 
ulations in an acceptable spam of time. Software packages 
like GROMACS [43,44], NAMD [45], AMBER [46] etc. do 
not only require shared computing architectures, but due 
to their complexity its use is limited. 

The procedure proposed in this paper offers a fast 
and reliable method to obtain the motion of the protein. 
The procedure runs on a single processor and is fond 
for further improvement by implementing simple dis- 
tributed computing algorithms. This procedure maintains 
the kinematic continuity of the movement and ensures the 
biological sense of the obtained structures. 

The presented procedure has been implemented in a 
new bioinformatic package with the aim of facilitating 
the comprehension of the processes by which biolog- 
ical machines perform their function. The simulation 
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strategies described herein help the user to understand 
the behavior of these mechanisms. The described pro- 
cedures require, at least partially, the availability of the 
initial and final conformations adopted by the biological 
machine under analysis. In this regard, the validation indi- 
cators implemented in the proposed simulation processes, 
help to overcome the lack of knowledge in protein struc- 
tures by providing a modeling tool to reconstruct the fold 
of a target protein from homologous molecules in other 
organisms. Moreover, it may also help in deciphering the 
molecular mechanisms underlying metabolic processes, 
signaling pathways or transport events, as well as in map- 
ping specific "conformational routes" that characterize 
the dynamic behavior of a promiscuous protein motif 
(i.e cystathionine beta synthase (CBS) domains), that 
undergoes different structural changes upon binding dis- 
tinct types of ligands (see [47-49]). It should not be 
neglected the capacity of our software to improve struc- 
tural search models in molecular replacement methods 
during the elucidation of novel crystal structures by X-ray 
diffraction techniques. 
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